-
Notifications
You must be signed in to change notification settings - Fork 136
Making the trustymail PSL cache read-only in Lambda #215
Conversation
The changes to trustymail don't take effect they are made after trustymail.trustymail is imported.
trustymail.PublicSuffixListFilename = './public-suffix-list.txt' | ||
# Monkey patching trustymail to make the PSL cache read-only | ||
trustymail.PublicSuffixListReadOnly = True | ||
import trustymail.trustymail as tmail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move the import statements to within the code, instead of at the top of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The monkey patching can only happen here, since at the top of the file we don't know if we're running locally or in Lambda. The import trustymail.trustymail as tmail
statement has to happen after the monkey patching, since otherwise it doesn't appear to have any effect on tmail.scan
. (I spent quite a while debugging this very issue yesterday!)
I could put import trustymail
and monkey patch for running locally at the top of the file, then re-patch inside the function if we're running in Lambda. The import trustymail.trustymail as tmail
statement would still have to be inside the function though (since it has to come after the monkey patching), and I thought the code would be easiest to understand if I just kept everything together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it.
This is your scanner and I'm happy to see this merged to support the change, but I left a comment over at cisagov/trustymail#74 about whether a new flag is necessary for this purpose.
We could also look at aligning the placement of the PSL to be inside cache/
for the Lambda function, which would eliminate the need for special-case code inside the trustymail
scanner at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I responded to your comment over on cisagov/trustymail#74, but I'll include it here too:
The caching that
trustymail
uses (which is useful whentrustymail
is run standalone) also checks the PSL file to see if it's less than 24 hours old. So it would still try to update the Lambda PSL file if we didn't have the extra flag.
I agree that moving the PSL to be inside of the cache
directory would be a good change to make. I can add that to this pull request or make a separate one if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To copy my latest comment from that thread here too, for those reading:
With
pshtt
, I worked hard to ensure that we never had to download third party data, either from the internet or from S3. And what do you know, it looks like I also use somelambda
/local
special-case logic to do that inpshtt
. So I should probably drop my complaint about having any Lambda-specific logic in thetrustymail
scanner.I still think, just as advice to
trustymail
, that havingtrustymail
manage a flag discussing the state of the read only file system is exposing the wrong layer of abstraction. I recommend having a flag which disables the 24-hour cache feature, rather than a flag which relates to a low-level implementation detail that conflicts with that feature.
I also filed #219 to capture some comments around repackaging the PSL and other third party data sources in scanners.
@konklone, the CircleCI status seems to be hung up. I'm not sure what is going on with that. |
Ah, I found the cause - when I set up protected branches, it cached any status checks it saw in the last week, and when I marked circleci as required it kept that requirement (even if they were no longer set up to be run). I just switched the requirements to be the new ones in the protected branches screen: And #217 is now validating properly. |
This is necessary because the local filesystem when running in AWS Lambda is read-only.
The PSL cache file is also in a different location when running under Lambda, and this pull request takes care of that as well.